[57] ABSTRACT 

A method and apparatus for translating a first address in a 
first address space, such as a processor address space, to a 
second address in a second address space, such as a system 
address space, are described. Data alignment signal deter- 
minations based on comparisons between destination and 
source addresses permit automatic replacement of virtual 
addresses with actual physical addresses to permit direct 
data transfer between devices. In one embodiment, the 
apparatus for translating comprises a processor; a page table 
having a mask register, a comparison value register, and a 
replacement value register; and a comparator coupled to the 
comparison value register and to the replacement value 
register. A programmable mask within the translation mask 
register is employed to partition a virtual address. A first 
subaddress comprises a subset of the bits of the first address 
and a second subaddress comprises remaining bits of the 
first address. The first subaddress is masked with a program- 
mable mask value in the translation mask register and is 
compared by the comparator with successive comparison 
values in the comparison value register until a match com- 
parison value is found. A programmable replacement value 
in the replacement value register corresponding to the match 
comparison value is concatenated with the second subad- 
dress. The programmable replacement values, which corre- 
spond to programmable map windows, permit the program- 
mable mapping of virtual addresses into different 
predetermined regions of system virtual address space. 

16 Claims, 8 Drawing Sheets 
METHOD AND APPARATUS FOR TRANSLATING ADDRESSES USING MASK
AND REPLACEMENT VALUE REGISTERS

This application is a continuation of U.S. patent application
Ser. No. 07/999,046, filed on Dec. 31, 1992, now abandoned, which
is a continuation-in-part of U.S. patent application Ser. No.
07/926,742, filed on Aug. 6, 1992, now abandoned, which is a
continuation-in-part of U.S. patent application Ser. No.
07/782,332, filed on Oct. 24, 1991, now U.S. Pat. No. 5,361,370.

FIELD OF THE INVENTION

This invention relates to video processing devices and in
particular to accessing differing memories within such devices.

BACKGROUND ART

Two types of video processors which may access both system and
local memory are well known in the prior art. It is well known in
the prior art to use multiple-instruction multiple-data (MIMD)
systems in this manner. In a multiple-instruction multiple-data
execution of an algorithm, each processor of the video signal
processor may be assigned a different block of image data to
transform. It is also known in the prior art to provide
single-instruction multiple-data (SIMD) architecture.
Single-instruction, multiple-data is a restricted style of
parallel processing lying somewhere between traditional
sequential execution and multiple-instruction multiple-data
architecture having interconnected collections of independent
processors. In the single-instruction, multiple-data model, each
of the processing elements or datapaths of an array of processing
elements or datapaths executes the same instruction in lock-step
synchronism. Parallelism is obtained by having each datapath
perform the same operation on a different set of data. In
contrast to the multiple-instruction, multiple-data architecture,
only one program must be developed and executed.

A conventional single-instruction multiple-data system may
include a controller, a global memory and execution datapaths,
although data transfers between the datapaths and system memory
may be quite complex. A respective execution unit memory may be
provided within each execution datapath. Single-instruction
multiple-data architecture performs as a family of video signal
processors united by a single programming model.

Single-instruction multiple-data architecture may be scaled to an
arbitrary number n of execution datapaths provided that all
execution datapaths synchronously execute the same instructions
in parallel. In the optimum case, the throughput of
single-instruction multiple-data architecture may theoretically
be n times the throughput of a single processor when the n
execution datapaths operate synchronously with each other. Thus,
in the optimum case, the execution time of an application may be
reduced in direct proportion to the number n of execution
datapaths provided within single-instruction multiple-data
architecture. However, because of overhead in the use of
execution datapaths, this optimum is never reached.

Single-instruction multiple-data architecture works best when
executing an algorithm which repeats the same sequence of
operations on several independent sets of highly parallel data.
For example, for a typical image transform in the field of video
image processing, there are no data dependencies among the
various block transforms. Each block transform may be computed
independently of the others. Thus, the same sequence of
instructions from instruction memory may be executed in each
execution datapath. These same instructions may be applied to all
execution datapaths by way of an instruction broadcast line and
execution may be independent of the data processed in each
execution datapath.

A problem with systems such as prior art single-instruction
multiple-data architecture is in the area of input/output
processing. Even in conventional single processor architecture a
single block read instruction may take a long period of time to
process because memory blocks may comprise a large amount of data
in video image processing applications. However, this problem is
compounded when there is a block transfer for each enabled
execution datapath of the architecture and the datapaths must
compete for access to global memory. For example, arbitration
overhead may be very time consuming. This is further complicated
by communication between [illegible] and the number of accesses
[illegible].

The alternative of providing each execution datapath with
independent access to external memory is impractical for
semiconductor implementation. Furthermore, this alternative
restricts the programming model so that data is not shared
between datapaths. Thus, further inefficiency results due to the
suspension of processing of instructions until all block reads
are completed. This may be seen in the discrete cosine transform
image kernel of Table I:

TABLE I

    for (i = 0; i < numberofblocks; i = i + 4) {
        k = i + THIS_DP_NUMBER;
        read_block(original_image[k], temp_block);
        DCT_block(temp_block);
        write_block(xform_image[k], temp_block);
    };

The read_block and write_block routines of the instruction
sequence of Table I must be suspensive; i.e., each routine must
be completed before the next operation in the kernel is
performed. For example, read_block fills temp_block in execution
unit memory with all of its local values. These local values are
then used by DCT_block to perform a discrete cosine transform
upon the data in temp_block. Execution of the discrete cosine
transform must wait for all of the reads of the read_block
command of all execution datapaths to be completed. Only then can
the DCT_block and write_block occur. Thus, by the ordering rules
above, read_block must be completed before the write_block is
processed or the DCT_block is executed.

The requirements imposed by the ordering rules within
single-instruction multiple-data architecture result in the
sequentialization of memory transactions and processing. For
example, a first memory read_block time segment of an execution
datapath must be completed before processing of DCT_block time
segment may begin. Processing of the DCT_block time segment must
be completed before the memory write_block time segment may
begin. Only when the memory write_block time segment is complete
can a second memory read_block time segment begin. Thus,
execution and access by a second execution datapath is then
sequentialized as described above for the first execution
datapath.

Similar requirements occur in high performance disk input/output
as well. In a typical disk input/output operation, an application
may require a transfer from disk while continuing to process.
When the data from disk are actually
needed, the application may synchronize on the completion of the
transfer. Often, such an application is designed to be a
multibuffered program. In a multibuffered program, data from one
buffer is processed while the other buffer is being filled or
emptied by a concurrent disk transfer. In a well designed system,
the input/output time is completely hidden. If not, the execution
core of single-instruction multiple-data architecture is
wait-stated until the data becomes available. This causes further
degrading of the performance of the single-instruction
multiple-data architecture.

A system addressing some of these problems is taught in
"Architecture for Video Signal Processing", U.S. patent
application Ser. No. 07/782,332 filed Oct. 24, 1991 by Sprague et
al., now U.S. Pat. No. 5,361,370. In the system of Sprague et
al., a single-instruction, multiple-data image processing system
is provided for more efficiently using parallel datapaths when
executing an instruction sequence having conditionals and greatly
improved external memory access. Each datapath of the Sprague et
al. image processing system has an execution unit and a local
memory. Access between the execution unit and the local memory is
by way of one port of a dual-ported local memory.

In this system, all transfers between the local memory and the
system memory take place using the second port of the dual-ported
local memory. The transfers between system and local memories are
scheduled and controlled by a common unit called the block
transfer controller. The block transfer controller, along with
the dedicated port of the dual-ported local memory, permit each
access to global memory by a datapath to be overlapped with its
instruction processing. This is useful in preventing stalling of
the processor. Thus the system of Sprague et al. solved several
problems associated with the single-instruction, multiple-data
architecture. However it did not solve all of the problems
related to transfer of data between the processor and both local
and system memory, along with associated problems relating to
interfaces and interrupts.

SUMMARY OF THE INVENTION

There is provided herein a method and apparatus for translating a
first address in a first address space to a second address in a
second address space. In one embodiment, the apparatus comprises
a processor; a page table having a translation mask register, a
comparison value register, and a replacement value register; and
a comparator coupled to the comparison value register and to the
replacement value register. The first address comprises a first
subaddress comprising a subset of the bits of the first address
and a second subaddress comprising remaining bits of the first
address. The first subaddress masked with a mask value in the
mask register is compared by the comparator with successive
comparison values in the comparison value register until a match
comparison value is found. A replacement value in the replacement
value register corresponding to the match comparison value is
concatenated with the second subaddress to provide the second
address.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram representation of an interface system
of the present invention employing two buses for system and local
memory interfaces.

FIG. 2 is a memory model of the apparatus of FIG. 1.

FIG. 3 is a more detailed block diagram representation of the
two-bus interface system of FIG. 1.

FIG. 4 is a processor memory address space to system virtual
memory address space page translation algorithm for the apparatus
of FIG. 1.

FIG. 5 shows a system virtual address space to processor address
space page translation algorithm for the apparatus of FIG. 1.

FIG. 6 shows a group of registers for controlling interrupts in
the apparatus of FIG. 1.

FIG. 7 shows the [illegible] status register of the apparatus of
FIG. 1.

FIGS. 8A and 8B show two data formats permitted within the
apparatus of FIG. 1.

FIG. 9 shows a memory diagram illustrating possible alignments of
data transferred from the processor to the system memory of the
apparatus of FIG. 1.

DETAILED DESCRIPTION OF THE INVENTION

Referring now to FIG. 1, there is shown a two-bus interface
system 10 of the present invention having a system processor 12.
System processor 12 of two-bus interface system 10 is provided
with two separate external interconnect buses 14, 26. External
interconnect buses 14, 26 of system processor 12 are connected
respectively to local memory interface 16 and system memory
interface 28. Local memory interface 16 controls access to local
memory 24 which may, for example, be a conventional dynamic
random access memory (DRAM) device, a VRAM, a static random
access memory (SRAM) or other resource. System memory interface
28 controls access to system memory or resource 36. During a
typical memory access cycle, system processor 12 of interface
system 10 may access either local memory 24 or system memory 36.

In order to control access to local memory 24 within interface
system 10, local memory interface 16 must receive from local
memory interconnect bus 14 signals suitable for controlling local
memory address bus 18, local memory data bus 20 and local memory
control bus 22. Local memory buses 18, 20, 22, coupled to local
memory interface 16, may be conventional buses such as those
normally required by a conventional dynamic RAM.

Likewise, system memory interface 28 must receive from system
memory interconnect bus 26 signals which are suitable to permit
interface 28 to control the buses of system memory 36. These
buses include address bus 30, system memory data bus 32 and
system memory control bus 34. Buses 30, 32, 34 are conventional
buses required for accessing a conventional system memory such as
system memory 36. System memory interconnect bus 26 may be a
multiplexed multi-master bus.

It will be understood that any number of devices may be coupled
to system memory interconnect bus 26 within two-bus interface
system 10 by way of system memory interface 28. For example,
memory mapped device 44 may be coupled in this manner. Device 44
may be memory mapped within interface system 10 only for the
purpose of permitting access by system processor 12 and may have
no memory of its own. Additionally an additional processor 40
having its own local memory 42 may thus be coupled to system
memory interconnect bus 26. Using the system of the present
invention, memory mapped device 44 and local memory 42 of system
memory 36 may be mapped into the virtual system address space of
system processor 12 in addition to system memory 36.

Because system memory interconnect bus 26 may be coupled between
system processor 12 and various host devices or peripheral
devices such as processor 40 and memory mapped device 44, system
memory interconnect




bus 26 may be understood to be a peripheral component 
interface bus. The primary use of system memory intercon- 
nect bus 26, i.e., peripheral component interface bus 26, is 
as a high-performance, low-latency path between system 
processor 12 and the various host devices or display/capture 
subsystems which may be coupled to two-bus interface 
system 10. As one of several devices coupled to peripheral 
component interface bus 26, system processor 12 may 
operate as either a master or as a slave in transactions 
involving peripheral component interface bus 26. 

Local memory interface 16 permits external frame buffer 
controllers (not shown) to communicate dynamic random 
access memory (DRAM) to sequential or serial access 
memory (SAM), SRAM and DRAM reads, as well as SAM, 
SRAM and DRAM to DRAM reads, and page read modes. 
System processor 12 receives transfer and split phase trans- 
fer read and write commands, and then performs the VRAM 
memory cycle requested. System processor 12 responds to 
VRAM transfer requests with an acknowledge pulse when 
the transfer is initiated, typically a small number of clock 
cycles after the VRAM transfer code is received by system 
processor 12. VRAM transfer commands have high access 
priority on local bus 14 compared to any other system 
processor 12 initiated local memory cycles. 

It will be understood that the split phase transfers are 
those wherein a double buffering technique is used to permit 
a second block of data to be read while a first block of data 
is still being processed within system processor 12. Split 
transfers by system processor 12 may be supported by 
external logic which monitors control lines of interface 
system 10 and schedules transfers accordingly. Split transfer 
capability within interface system 10 allows local memory 
16 to be packed more efficiently by eliminating the need to 
perform precisely timed mid scan line transfer cycles. 
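
The double buffering idea behind split phase transfers can be sketched in software. This is only an illustrative model under stated assumptions: the function name `process_with_double_buffering` and the use of Python lists as buffers are hypothetical and do not come from the patent, and a sequential program can only show the ordering of the steps, not the true overlap that the hardware provides.

```python
def process_with_double_buffering(blocks, transform):
    """Apply transform to each block using two alternating buffers.

    Hypothetical software model: in hardware, staging the next block
    would overlap in time with processing of the current block.
    """
    results = []
    if not blocks:
        return results
    buffers = [None, None]
    buffers[0] = blocks[0]                  # prime the first buffer
    for i in range(len(blocks)):
        current, spare = i % 2, (i + 1) % 2
        if i + 1 < len(blocks):
            buffers[spare] = blocks[i + 1]  # stage the second block
        results.append(transform(buffers[current]))
    return results
```

For example, calling `process_with_double_buffering([[1, 2], [3, 4], [5]], sum)` processes the three blocks in order while the spare buffer always holds the block that will be needed next.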

Although interface system 10 is shown with a single local 
memory 24, it will be understood that system processor 12 
may support two banks (not shown) of memory on local bus 
16. Each of the local memory banks may have four base 
address pointers associated with it. In addition, each address 
pointer may have a base address and a dedicated pitch 
register for doing next address calculations. The eight 
address pointers of the two banks of local memory 24 are all 
"write only" by microcode by way of system bus 14. 

Information transmitted by way of local memory interface 
16 is transmitted and received in command words. The nine 
bit word commands describe either of two formats which are 
described in more detail hereinbelow. Various bits in this 
command word indicate whether the transfer is read or write, 
which bank of memory is accessed, which of the four 
associated address pointers is used, and how to perform 
pitch offset calculations or register loads. Additionally, vari- 
ous bits in the command word may be used to indicate a 
horizontal line code, frame increment code, or to perform a 
base pointer copy. 
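
A next address calculation using a base address and a pitch register might look like the following sketch. The class name `AddressPointer`, the method `advance`, the helper `make_banks`, and the assumption that the pitch is simply added to the base on each step (wrapping at thirty-two bits) are all hypothetical; the text above states only that each pointer has a base address and a dedicated pitch register, and that two banks have four pointers each.

```python
from dataclasses import dataclass

ADDRESS_MASK = 0xFFFFFFFF  # assumption: 32-bit addresses


@dataclass
class AddressPointer:
    base: int   # current base address
    pitch: int  # contents of the dedicated pitch register

    def advance(self) -> int:
        """Return the current address, then step the base by the pitch."""
        address = self.base
        self.base = (self.base + self.pitch) & ADDRESS_MASK
        return address


def make_banks(num_banks=2, pointers_per_bank=4):
    """Build two banks of four pointers each, eight pointers in total."""
    return [[AddressPointer(0, 0) for _ in range(pointers_per_bank)]
            for _ in range(num_banks)]
```

With a pitch equal to the length of one scan line, repeated calls to `advance` would walk down successive lines of a frame buffer; that interpretation of the pitch is an assumption made for illustration.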

Referring now to FIG. 2, there is shown memory model 
100. Memory model 100 represents the mapping of system 
virtual address space 130 and processor memory address 
space 150 of system processor 12 within two-bus interface 
system 10. In order to permit system processor 12 to 
selectively write a piece of data either (1) into local device 
24, or (2) into system memory 36, both resources (24, 36) 
are mapped into the memory space of system processor 12 
as represented by memory model 100. Note that the label 
"0V is merely a way to distinguish addresses in system 
virtual address space 130, which do not contain this prefix, 
from addresses in processor address space 150, as will be 
understood by those skilled in the art. 
Thus, system processor 12 may generate addresses anywhere in a
single, linear four gigabyte address space using microcode,
direct memory access or block transfer controller operations.
Addresses less than one gigabyte are mapped into local memory 24
or into registers (not shown) within system processor 12.
Addresses greater than one gigabyte are mapped, for example, into
system memory 36, memory mapped device 44, or local memory 42.
Accesses of system memory 36 by system processor 12 use
programmable processor-to-system space page translation table 120
in order to map two windows 122a, b into system virtual address
space 130. References from system virtual address space 130 to
the first gigabyte of processor memory address space 150 are
mapped by programmable system-to-processor space page translation
table 140 in order to map four windows 142a-d into local memory
24.

System virtual address space 130 represented by memory model 100
may be large. For example, system address space 130 may be four
gigabytes. In the preferred embodiment of two-bus interface
system 10, system address space 130 may be mapped into processor
memory address space 150 wherein address space 150 has three
processor address partitions 102, 104, 106. Within processor
memory address space 150, addresses between zero and one gigabyte
minus four kilobytes may be mapped into first processor memory
partition 102. Memory locations corresponding to first processor
memory partition 102 may be located in a physical local memory
bank such as local memory 24 which is accessed by way of local
memory interface 16.

Second processor memory partition 104, having a size of
approximately four kilobytes, may also be reserved within
interface system 10. Four kilobyte processor memory partition 104
may preferably be located immediately above first processor
memory partition 102 in the last four kilobytes of the first
gigabyte of virtual address space 130 of system processor 12.
Second processor memory partition 104 may most advantageously be
mapped into internal registers (not shown) within system
processor 12 of interface system 10. The third processor memory
partition within two-bus interface system 10, processor memory
partition 106, may have addresses from one gigabyte to four
gigabytes. Processor memory partition 106 may reside, for
example, in system memory 36 accessed by way of system memory
interface 28.
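
The three partitions described above can be summarized as a small address classifier. This is a minimal sketch: the function name `classify_processor_address` and the exact treatment of the boundary addresses (half-open ranges) are assumptions, while the partition numbers 102, 104 and 106 and their sizes follow the text.

```python
GIB = 1 << 30  # one gigabyte
KIB = 1 << 10  # one kilobyte


def classify_processor_address(address: int) -> int:
    """Return the partition number (102, 104 or 106) for an address.

    Assumes a 32-bit processor memory address space and half-open
    ranges; the boundary handling is an illustrative choice.
    """
    if not 0 <= address < 4 * GIB:
        raise ValueError("address outside the four gigabyte space")
    if address < GIB - 4 * KIB:
        return 102  # local memory, via local memory interface 16
    if address < GIB:
        return 104  # last four kilobytes of the first gigabyte: registers
    return 106      # one to four gigabytes: system memory side
```

For instance, address `0x3FFFF000` is the first address of the four kilobyte register partition 104, and `0x40000000` is the first address of partition 106.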

The preferred embodiment of two-bus interface system 10 of the
invention is provided with virtual addresses of thirty-two bits.
These addresses are mapped within memory model 100 by
programmable page translation tables 120, 140. Programmable page
translation tables 120, 140 may be located within the lower one
gigabyte of processor memory address space 150.

In the page translation method of the present invention,
programmable map windows 122a, b are defined by a user in
processor-to-system space page translation table 120. In this
embodiment, windows 122a, b are mapped into programmably
selectable regions of system virtual address space 130. A user of
interface system 10 may easily change the locations of address
space 130 into which system processor 12 generated addresses are
mapped by means of programmable map windows 122a, b by changing
the programming of map windows 122a, b within processor-to-system
space page translation table 120.
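
The mask, compare and replace steps set out in the Summary can be sketched as follows. This is a hedged illustration rather than the patented circuit: the names `WindowEntry` and `translate`, the choice of one mask per table entry, and the example register values are all assumptions made here. The masked bits play the role of the first subaddress and the unmasked bits the role of the second subaddress.

```python
from typing import NamedTuple, Optional, Sequence

ADDRESS_MASK = 0xFFFFFFFF  # thirty-two bit addresses


class WindowEntry(NamedTuple):
    mask: int     # selects the bits forming the first subaddress
    compare: int  # comparison value for the masked bits
    replace: int  # replacement value for the masked bits


def translate(address: int, table: Sequence[WindowEntry]) -> Optional[int]:
    """Translate address through the first matching window entry.

    Returns None when no entry matches; the text treats that case as
    a page table fault.
    """
    for entry in table:
        if (address & entry.mask) == entry.compare:
            second_subaddress = address & ~entry.mask & ADDRESS_MASK
            return entry.replace | second_subaddress
    return None


# Hypothetical table with two windows: one megabyte and four kilobytes.
EXAMPLE_TABLE = [
    WindowEntry(mask=0xFFF00000, compare=0x40000000, replace=0x80000000),
    WindowEntry(mask=0xFFFFF000, compare=0x50000000, replace=0x00ABC000),
]
```

With this example table, `translate(0x40012345, EXAMPLE_TABLE)` yields `0x80012345`, because the upper twelve bits are replaced and the lower twenty bits pass through unchanged. Reprogramming the `replace` field alone moves a window to a different region of the target address space.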

Each programmable map window 122a, b of page translation table
120 may be of variable size, from a minimum of four kilobytes to
a maximum of one gigabyte. The size of map windows 122a, b can be
increased in increments of 2^x, where x is a positive integer.
Map windows 122a, b must not
overlap within page translation table 120 or exceed the three
gigabytes of physical memory available within system virtual
address space 130. If an address generated by system processor 12
does not match an entry in processor-to-system space page
translation table 120, it is ignored and a page table fault is
issued. A page table fault issued in this manner within two-bus
interface system 10 terminates current operations of block
transfer controller 368, as described hereinbelow, terminates all
scheduled direct memory access operations, and generates an
interrupt of system processor 12.

In the page translation method of the present invention
programmable map windows 142a-d are defined by a user in
system-to-processor space page translation table 140. In this
embodiment, map windows 142a-d are mapped into memory partitions
102, 104 of processor memory address space 150 from system
virtual address space 130. The locations of address space 150
into which system generated addresses are mapped may be readily
changed by changing the programming of programmable map windows
142a-d of system-to-processor space page translation table 140.

Each programmable window 142a-d of system-to-processor space page
translation table 140 may be of variable size, from a minimum of
four kilobytes up to a maximum of one gigabyte, in increments of
2^x, where x is a positive integer. Windows 142a-d must not
overlap within page translation table 140 or exceed the one
gigabyte of physical memory corresponding to processor memory
partitions 102, 104. If a system generated address does not match
an entry in system-to-processor space page translation table 140,
it is ignored and a page table fault is issued. A page table
fault in two-bus interface system 10 terminates current
operations of block transfer controller 368, as described
hereinbelow, terminates all scheduled direct memory access
operations, and generates an interrupt of system processor 12.

Referring now to FIG. 3, there is shown two-bus interface system
10 including a block diagram representation of single-instruction
multiple-data architecture image processor 300. While here image
processor 300 is used to perform the functions of system
processor 12, it will be understood that processors other than
single-instruction multiple-data architecture image processor 300
may be used within two-bus interface system 10. A processor such
as image processor 300 is taught in parent U.S. patent
application Ser. No. 07/782,332, filed by Sprague et al. on Oct.
24, 1991, now U.S. Pat. No. 5,361,370, which is incorporated by
reference herein.

Each execution datapath 358a-n of single-instruction
multiple-data image processor 300 is provided with a respective
execution unit 360a-n and execution unit memory 362a-n. Each of
the execution units 360a-n is coupled to its respective execution
unit memory 362a-n by way of a respective port 361a-n and to
local memory 24 and system memory 36 by way of a respective port
363a-n. Ports 361a-n and ports 363a-n, together, provide each
execution datapath 358a-n with a dual port architecture to permit
each execution unit 360a-n to access its respective execution
unit memory 362a-n at the same time that data is being
transferred between execution unit memories 362a-n and local
memory interface 16 or system memory interface 28. It will be
understood that within the dual port architecture of image
processor 300, no execution unit 360a-n may directly access any
execution unit memory 362a-n except its own.

During execution of instructions, instruction sequence controller
352 of single-instruction multiple-data image processor 300
simultaneously applies the same instruction to every execution
datapath 358a-n by way of broadcast instruction line 356. The
instructions applied by sequence controller 352 may, for example,
be previously stored in system memory 36. The instructions
received by sequence controller 352 are applied to sequence
controller 352 by way of [illegible] 340.

Within image processor 300, conditional execution permits each
datapath 358a-n to execute or not execute a particular issued
instruction depending on the state of the local datapath
condition flag. Hardware execution masks, residing within
execution units 360a-n of image processor 300, permit individual
datapaths 358a-n to turn off execution of a sequence of issued
instructions for an arbitrary period of time. These two
mechanisms decrease the amount of wait stating or idling of
execution datapaths [illegible].

Control over whether an instruction issued by instruction
sequence controller 352 is executed or ignored by an individual
execution datapath 358a-n is required for data-dependent
computation (which includes different algorithms for different
data as a function of the nature of the data) in a
single-instruction multiple-data architecture such as the
architecture of image processor 300. It is required because each
execution datapath 358a-n may have a different value when a test
is performed as part of a conditional branch. Thus, each
execution datapath 358a-n within image processor 300 of the
present invention is provided with individual datapath execution
masks.

It is equally important to control the sequence of instructions
provided by sequence controller 352 to execution datapaths 358a-n
by way of broadcast instruction line 356. This control is
essential for loops and may also be used to optimize
data-dependent execution wherein no execution datapath 358a-n is
required to execute a conditional sequence of instructions.

For the purpose of executing a conditional branch within image
processing architecture 300, each datapath 358a-n tests the
condition of a conditional branch and independently sets its own
flags according to its own local determination. Signals
representative of these flags are applied by each execution
datapath 358a-n to instruction sequence controller 352 by way of
flag lines 354a-n.

Rather than automatically wait-stating all execution datapaths
358a-n during a conditional branch, single-instruction
multiple-data architecture 300 of the present invention uses the
flag signals of flag lines 354 to apply a consensus rule. In the
consensus rule of image processor 300, sequence controller 352
does not apply a conditionally executed instruction sequence to
broadcast instruction line 356 unless flag lines 354 signal
controller 352 that every execution datapath 358a-n requires the
instruction sequence. This prevents the inefficiency which
results when some execution datapaths 358a-n are wait-stated for
the duration of a sequence which is not executed by some of the
datapaths 358a-n.

Both mechanisms, conditional execution and execution masks, may
be used to implement the conditional execution within image
processor 300 when some but not all datapaths 358a-n require it.
Of these two mechanisms, execution masks EM are more general
(i.e., affects the entire datapath). The execution mask flag is
appended to the normal set of local arithmetic condition code
flags within each execution unit 360a-n. When an execution mask
flag EM is set within
an execution unit 360a-n and sequence controller 352 applies the conditional execution sequence to broadcast instruction line 356, each execution unit 360a-n having its execution mask flag set ignores the instructions.

The only exceptions to instructions being ignored by execution datapath 358a-n within image processor 300 when an execution mask flag EM is set are (1) an instruction which restores the state of the previous execution mask flag, and (2) those instructions which unconditionally modify the execution mask flag EM. These instructions are executed by all execution units 358a-n even if the execution mask flag EM within a datapath 358a-n is set. Thus, if the execution mask flag EM is set in a selected execution unit 360a-n, instructions from instruction sequence controller 352 are ignored by the selected execution unit 360a-n. It is then possible to encode a conditional thresholding program fragment within single-instruction multiple-data architecture image processor 300 using execution masks EM.

As previously described, each execution datapath 358a-n within single-instruction multiple-data image processor 300 is equipped with an execution unit memory 362a-n. Each execution unit 360a-n directly accesses its own execution unit memory 362a-n by way of a respective port 361a-n of image processor 300. Each port 361a-n is provided with both an A port and a B port. Different signals may be transmitted between each execution unit 360a-n and its execution unit memory 362a-n simultaneously by way of the A and B ports under the control of the program being executed within execution units 360a-n. It will be understood that this transfer by way of ports 361a-n is distinguished from transfers by way of ports 363a-n which are under the control of block transfer controller 368.

It will be understood that this type of access to local execution unit memories 362a-n by execution units 360a-n involves writing of pointers only. Thus, these operations are not actually random accessing of execution unit memories 362a-n. Block transfer controller 368 permits split phase transactions. These split phase transactions are completely independent of instruction sequence controller 352. Thus, block transfer controller 368 operates as a separate instruction engine or controller not directly controlled by instruction sequence controller 352. This allows efficient access to memory, for an instruction cache, for example, as further discussed below. Therefore, block transfer controller 368 minimizes idling or wait stating of execution datapaths 358a-n while waiting for instructions.

It will be understood by those skilled in the art that conventional image processing systems usually provide processor consistency in that instructions are executed in the order that they are requested from memory. It will also be understood that single-instruction multiple-data image processor 300 of the present invention is provided with weak processor consistency because block transfer controller 368, functioning as a separate instruction engine or controller, can cause certain memory read requests to pass other memory read requests.

Within single-instruction multiple-data architecture 300 there is provided a method to more efficiently read blocks of data from system memory 36 into execution unit memories 362a-n, and operate on the data within execution unit memories 362a-n by way of lines 340, 342, 344. In order to accomplish these more efficient block read and block write operations, single-instruction multiple-data image processor 300 is provided with block transfer instructions and block transfer architecture. These input/output operations within single-instruction multiple-data image processor 300 are handled by autonomous synchronous block transfer controller 368. It will be understood that these are the operations of peripheral component interface bus 26 of the present invention.

Block transfer controller 368 within single-instruction multiple-data image processor 300 allows the transfer of two-dimensional arrays which are conformably displaced. This allows a subblock of a large image to be copied in a single block operation, for example. In general, using source and destination bit maps, conformably displaced blocks may be transferred even though they do not have the same aspect ratio or alignment in physical memory.

The specification for a block transfer operation initiated by a program within image processor 300 is a set of lists of individual block transfers. Each enabled execution datapath 358a-n builds a list of block transfer commands in its execution unit memory 362a-n. A single block transfer initiate instruction eventually leads to the processing of all block transfer commands from the lists of every enabled execution datapath 358a-n. In addition, up to two sets of lists of block transfers may be pending at any time.

Referring now to FIG. 4, there is shown processor memory address space to system virtual address space page translation algorithm 400. Processor to system address space page translation algorithm 400 defines how two-bus interface system 10 translates thirty-two bit virtual address 402 generated by system processor 12 into a thirty-two bit virtual address which is effective to access system virtual address space 130.

During the address phase of a bus cycle, thirty-two bit virtual address 402 from block transfer controller 368 is latched internally within image processor 300. Virtual address 402 may be generated by microcode within image processor 300 or by a direct memory access. If virtual address 402 is in the lower one gigabyte of processor memory address space 150, as determined by comparison determination 410, the access is to local memory 24 and no address translation is required. If the address is greater than one gigabyte, as determined by comparison determination 410, it must be translated using page translation algorithm 400.

Translation of system processor 12 generated addresses greater than one gigabyte is as follows. In block 406 of page translation algorithm 400, a programmable mask residing within a selected twenty bit processor-to-system address translation mask register 430a, b is used to partition thirty-two bit virtual address 402. Two or more programmable masks within mask registers 430a, b are provided for this purpose within programmable processor-to-system space page translation table 120. Masks within mask registers 430a, b, denoted as Entry 0 and Entry 1 within page translation table 120, are effective to partition virtual address 402 into a variable sized virtual page address 412 of zero to twenty bits, and a variable sized offset 420 of twelve to thirty-two bits. This allows windows with sizes of four kilobytes to one gigabyte to be opened. The sizes of the windows are powers of two (2^x, where x is a positive integer).

A comparison between variable page address 412, formed by a selected mask within mask registers 430a, b, and a corresponding comparison value within comparison registers 434a, b is performed by page translation algorithm 400 at the page match comparison determination 414. Comparison values of registers 434a, b within page translation table 120 are denoted as Entry 0 and Entry 1. When page address 412 results from partition by the mask residing in mask register 430a, the comparison is performed using the comparison value residing in register 434a. When virtual page address 412 results from partition by the mask residing in mask register 430b, the comparison is performed using the comparison value residing in comparison register 434b. The comparisons are made between page address 412 and the contents of successive comparison registers 434a, b until either a match is found or the entire contents of comparison registers 434a, b have been compared with page address 412.

If a bit in a selected mask register 430a, b is one, a bit in corresponding comparison register 434a, b is compared with the corresponding bit of virtual page address 412. If the bit in the selected mask register 430a, b is zero, the corresponding bit of virtual page address 412 is not compared to a bit in the corresponding comparison register 434a, b. A match is found when all of the compared bits of virtual page address 412 match the corresponding bits of the comparison values of registers 434a, b.

If a match is found, zero to twenty bit replacement value 418 is used to replace page address 412 and, thereby, at least a portion of the upper zero to twenty bits (31:12) of original virtual address 402. It will be understood that the notation (31:12) indicates a twenty bit portion of a thirty-two bit word which extends from bit thirty-one to bit twelve.

If the match is between page address 412 and the value of comparison register 434a, the programmable replacement value of replacement register 438a is used to provide replacement value 418. If a match is found between page address 412 and the value of comparison register 434b, the programmable replacement value of replacement register 438b is used to provide replacement value 418. In either case, virtual page address 412 is replaced with a value permitting access to an actual physical location within address space 130.

It will be understood that the programmable replacement values of replacement registers 438a, b correspond to programmable windows 122a, b of processor-to-system space page translation table 120. Modification of the values of replacement registers 438a, b permits the programmable mapping of virtual address 402 generated by image processor 300 into differing predetermined regions of system virtual address space 130.

Based upon the selected translation mask 430a, b, between thirty-two and twelve bits of variable sized offset 420 are partitioned from virtual address 402 for concatenation with replacement value 418, as previously described; in this manner, variable sized offset 420 is concatenated with replacement value 418 to form physical address 422 within one of the regions of system virtual address space 130 so mapped by map windows 122a, b. Address 422 is the actual physical address of the location accessed by system processor 12. Physical address 422 may undergo a further translation which converts physical address 422 to a row and column address needed to access a selected system device in accordance with datatype bits 416 as described in more detail hereinbelow.

If no match with page address 412 is found, page address 412 is not used, and no access of buses 14, 26 is performed in response to virtual address 402. If page address 412 matches more than one comparison value in comparison registers 434a, b, the first comparison value which matches is used to determine a selected replacement register 438a, b and thereby to determine replacement value 418. A program implementing the page translation of algorithm 400 is shown in Table 1. The page translation program of Table 1 is in a form which will be understood by those skilled in the art.

TABLE 1

    /* For all bits in each register */
    for (j = 0; j <= 1; j++) {
        /* Determine page size and offset size */
        if (ADDRESS < 1 gigabyte) then
            PHYSADDR = ADDRESS;
            break;
        /* Determine page size and offset size */
        [two lines illegible in source]
        for (i = 12; i <= 31; i++) {
            /* Compare the compare register bit and the
               corresponding virtual page address bit */
            [line illegible in source]
        }
        /* If match is found, then assemble the physical address */
        if (TEMP = 0xfffff000) then {
            PHYSADDR = (BSTLMR[j,31:0]) or OFFSET;
            MATCH++;
            [line illegible in source]
            DATATYPE CYCLETYPE = BSTLMC[j,9:6];
        }
    }
    if (MATCH = 0) {
        fault;
        clear all requests;
    }

Referring now to FIG. 5, there is shown system-to-processor space page translation algorithm 500 for translating [illegible] generated virtual addresses from system address space 130 into addresses within local memory partitions 102, 104 of processor memory address space 150. Thus, page translation algorithm 500 substantially performs the operations of system-to-processor space translation table 140 of memory model 100.

When implementing page translation algorithm 500, two-bus interface system 10 causes thirty-two bit virtual address 504 to be latched within image processor 300. In masking block 506 of page translation algorithm 500, a selected programmable twenty bit translation mask residing within translation mask registers 514a-d is applied to latched virtual address 504. The selected translation mask is effective to partition virtual address 504 into variable-sized virtual page address 510 and variable sized offset 536. The programmable translation masks within translation mask registers 514a-d of system-to-processor address space page translation table 140 are denoted as Entries 0, 1, 2, 3.

In this manner, virtual address 504 is partitioned into variable sized virtual page address 510 of zero to twenty bits and variable sized offset 536 within page translation algorithm 500 as previously described. Variable sized offset 536 is between thirty-two and twelve bits. This allows window sizes of four kilobytes to one gigabyte, in powers of two (2^x, where x is a positive integer), to be opened within interface system 10.

A selected comparison value within comparison registers 526a-d is compared with virtual page address 510 at page match comparison 522. If a bit in the selected translation mask of translation mask registers 514a-d has a value of one, the corresponding bit in comparison registers 526a-d is compared with the corresponding bit in virtual page address 510. If the bit in the selected translation mask register 514a-d has a value of zero, the corresponding bit of virtual page address 510 is not compared to the bit in comparison registers 526a-d. A match is found when all of the compared
bits of virtual page address 510 match the corresponding bits in corresponding comparison register 526a-d.

If a match is found within page translation algorithm 500 at page match comparison 522, a zero to twenty bit replacement value in a replacement register 540a-d is selected. The replacement value selected in this manner replaces virtual page address 510 and, thereby, the upper zero to twenty bits of original virtual address 504. In this manner the virtual addresses generated by devices within system virtual address space 130 are replaced with bits representing the actual address to be accessed within processor memory address space 150. Based upon the translation mask value of selected mask register 514a-d, between thirty-two bits and twelve bits of variable-sized offset 536 are concatenated to replacement value 534 in order to form physical address 538.

This concatenation of replacement value 534 and variable-sized offset 536 produces physical address 538 at which the addressed data may be accessed within processor memory partitions 102, 104. It will be understood that this process permits processor 40 (see FIG. 1), for example, to access local memory 24 without interrupting image processor 300 (see FIG. 3). The access of local memory 24 in this case may be by way of block transfer controller 368 which handles the access independently of execution units 360a-n.

If no match is found at page match comparison 522, a page table fault interrupt is generated within two-bus interface system 10 of the present invention. In response to the page table fault, virtual address 504 is not used by image processor 300 and all posted operations of block transfer controller 368 and any scheduled direct memory access operations are terminated. New requests are accepted within image processor 300, for example, to load an interrupt service routine or a trap.

If virtual page address 510 matches the contents of more than one comparison register 526a-d, the first page table entry for which a match is found is used to select replacement value 534 for concatenation with variable sized offset 536. A program implementing the page translation of algorithm 500 is shown in Table 2. The program of Table 2 is written in a form which will be understood by those skilled in the art.

TABLE 2

    /* For all bits in each register */
    for (j = 0; j <= 3; j++) {
        /* Determine page size and offset size */
        VPAGEADDR = (BSTLMR[j] and ADDRESS);
        [line illegible in source]
        for (i = 12; i <= 31; i++) {
            /* Compare the compare register bit and the
               corresponding virtual page address bit */
            if ([condition illegible in source]) TEMP[i] = 1 else TEMP[i] = 0;
        }
        /* If match is found, then assemble the physical address */
        if (TEMP = 0xfffff000) then {
            PHYSADDR = (BSTLMR[j,31:0]) or OFFSET;
            break;
        }
    }

Referring now to FIG. 6, there is shown interrupt register group 600 including interrupt control registers 602-612 for controlling errors and interrupts within two-bus interface system 10. Two-bus interface system 10 may be interrupted in at least three ways. A non-maskable interrupt may be asserted, a general interrupt may be asserted, or a write to an interrupt register may be performed. These three types of interrupts within interface system 10 are all controlled by interrupt control registers 602-612 within interrupt register group 600.

Interrupt control registers 602-612 are herein grouped together as interrupt register group 600 for convenience in describing the operation of the interrupt operations of two-bus interface system 10 of the present invention. However, it will be understood by those skilled in the art that interrupt control registers 602-612 may physically reside in differing locations within two-bus interface system 10.

Bus interface status register 602 of interrupt register group 600 includes register fields 614a-i. Transfer register field 614a of interface status register 602 is used to indicate an attempted VRAM transfer cycle to a bank of memory containing DRAM. Page translation fault field 614b may be set to a value of one to indicate the occurrence of a translation error during a system processor 12 initiated transfer to system memory 36 by way of peripheral component interface bus 26 and system memory interface 28. Direct memory access (DMA) register fields 614c, d may indicate that the start of a direct memory access transaction to system memory 36 is not directed to a valid map window 122a, b within page translation table 120.

Direct memory access transfer completion information, as well as direct memory access or page table fault information, are also stored in bus interface status register 602. This information is used in connection with bus interface status mask register 604, communicated to instruction sequence controller 352 and latched into trap status register 700 which is described hereinbelow. For this purpose, register fields 614e, f of interface status register 602 may indicate that a direct memory access operation has completed a scheduled transaction.

Processor interrupt field 614g indicates that system processor 12 itself has initiated an interrupt within two-bus interface system 10. The signal of processor interrupt field 614g of interface status register 602 may be logically AND'ed with interrupt mask field 616g of bus interface status mask register 604. The structure and function of bus interface status mask register 604 are described hereinbelow.

The result of this logical AND operation may be outputted from system processor 12 to provide an external signal indicating that system processor 12 has initiated an interrupt according to the corresponding mask bit. Register fields 614h, i may be used to monitor overall system processor 12 performance as well as input/output operations for purposes of synchronization.

Interface status mask register 604 contains register fields 616a-i. Register fields 616a, b may contain, for example, [text missing in source] fault. Register fields 616c, d contain masks for direct memory access error interrupts. Register fields 616e, f contain masks for direct memory access interrupts. Register field 616g contains the mask bit for processor interrupt field 614g of interface status register 602 as previously described. In a similar manner, register fields 618a-e of bus system interface fault register 606 may contain indications of a variety of system interface bus error conditions.

Register fields 620a-e of bus system interface fault mask register 608 may contain system interface bus error condition mask bits for a variety of sources. Bus interface interrupt register 610 may have thirty-two bits which serve as system interrupt registers. These bits within bus interface interrupt register 610 are reset when register 610 is read. Bus
cycle errors are stored in bus system interface fault mask register 608. Register 608 is used in connection with bus system interface fault register 606 and its content communicated to instruction sequence controller 352 and latched into trap status register 700.

Thirty-two bit bus interface interrupt register 610 may reside in system memory interface 28 in order to be accessible to devices in interface system 10 such as system memory 36, processor 40 and memory mapped device 44. A read of bus interface interrupt register 610 by image processor 300 resets all of the bits of interrupt register 610 to zero. Bus interface interrupt mask register 612 provides thirty-two mask bits within interrupt register group 600. Each bit of thirty-two bit bus interface interrupt mask register 612 corresponds to and provides a mask for a bit in bus interface interrupt register 610.

Referring now to FIG. 7, there is shown trap status register 700 within single-instruction multiple-data architecture image processor 300. The errors and exception conditions in bus interface status register 602 and bus system interrupt fault register 606 may be represented by assigned trap bits within trap status register 700. Thus, the status of various traps of two-bus interface system 10 may be determined by image processor 300 by performing logical operations upon selected bits of trap status register 700. Based upon these determinations, execution may be directed to one or more of a number of routines adapted for handling specific errors and exception conditions. These special service routines are described in more detail hereinbelow.

For this purpose, a signal is sent to sequence controller 352 and a signal is latched within trap status register 700 when predetermined interrupt bits within interrupt register group 600 are active and unmasked. The interrupt trap handler may then read bus interface interrupt register 610 to determine which device generated the interrupt. Trap status register 700 may be loaded, stored, tested and modified by execution unit 360a of single-instruction multiple-data architecture image processor 300 under program control in the preferred embodiment of two-bus interface system 10. The modification of trap status register 700 by image processor 300 may be a bit-by-bit clear performed by an interrupt service routine.

Examples of the uses of the bits of trap status register 700 are as follows. Branch instruction trap bit 702a may be generated when an unconditional jump is executed or when a conditional branch is taken within image processor 300. Real time counter bit 702b may be generated when a real time counter reaches zero. Interrupt pin trap bit 702c may be generated by a high level interrupt on an external interrupt pin and interrupt register trap 702d may be generated when a write is performed to interrupt input register 610 in order to permit multiple interrupts.

Interrupt pin 702c provides an advantageous feature within trap status register 700 because it is a trap type bit which is level sensitive. Level sensitive bit 702c within trap status register 700 permits several different interrupt sources to be vectored onto, i.e., combined to produce, a single trap bit. The remaining bits of trap status register 700 may capture various trap requests and hold them active in the manner of a conventional latch. Thus, an interrupt may be detected by image processor 300 after the source which generated the request is inactive. The trap handler may execute an interrupt acknowledge cycle by sending the appropriate cycle code in processor-to-system space page translation table 120.

Exception conditions detected in programs or external interrupts usually require special service routines as previously described. These routines, in general, are not local to an instruction cache. Under these conditions, the instruction about to be executed is aborted and execution is transferred to a programmer-defined trap handler located at a predetermined address in local memory 24. The trap handler identifies the source of the trap and executes the required trap service routine. Once the trap handler has serviced a trap, the aborted instruction is restarted and normal execution is resumed.

Several mechanisms may cause invocation of the trap handler within two-bus interface system 10. Interface system 10 provides software trap instructions which may be used to set user defined breakpoint conditions. External conditions such as interrupts, non-maskable interrupts, and assertion of a reset also generate traps. Finally, anomalies in the normal processing sequence, such as page table faults, invoke the trap handler.

A trap may be generated using an external interrupt pin, when, for example, processor 40 (see FIG. 1) writes to register 610. As previously described, this pin may be level sensitive. Using a level to indicate an interrupt allows several interrupt sources to vector to the same trap handling routine. Image processor 300 may interrogate each device throughout system virtual address space 130 or generate an interrupt acknowledge cycle to determine which is the requesting device.

Additionally, there are software initiated traps. For example, a single set trap occurs when a predetermined bit of the trap status register 700 is set to one by the microcode of image processor 300. In a similar manner, branch instruction traps, reserved instruction traps, and instruction traps may all be initiated by the execution of an instruction. The reserved instruction trap may be serviced by image processor 300 causing the trapped instruction to be read from address space 130 into execution unit memory 362a-n where it may be analyzed and emulated under program control. All of these traps are maskable and the instruction trap is always enabled.

Additionally, when a real time counter reaches zero, the frame counter is incremented by one, and when direct memory access transfers are completed, a maskable trap is initiated. Also, when there is no match in page translation tables 120, 140 during an input/output access, a system bus cycle error occurs or a bus interface fault trap is generated and a trap is initiated.

Information as to the state of image processor 300 which must be saved by the trap handler is application dependent. For example, if an arithmetic logic unit (ALU) of an execution unit 360a-n is used by the trap handler then, in general, the contents of the arithmetic logic unit (ALU) must be stored along with the contents of various other registers which are passed through the ALU. When execution returns from the trap handler, the process is reversed. The ALU is loaded by passing the data through one of the other registers. Thereafter, the contents of any latches which were saved are restored. Thus, the extent to which information as to the state of image processor 300 must be saved will vary with the type of trap being serviced.

Once a trap condition has been met within image processor 300, a bit in trap status register 700 is set. One exception to this is the reset wherein a bit in trap status register 700 is not set. Note that the reset trap must be recognized by the trap handler because no unmasked trap types are active in trap status register 700. Trap status register 700 contains a bit for each of the different traps as well as a global enable bit (not shown). There is also a mask bit for each trap in a
trap mask register, which allows each of the traps of register 700 to be individually masked out.

Except for the non-maskable traps such as reset and the non-maskable interrupt, the initiation of a trap within two-bus interface system 10 depends upon two conditions: (1) the corresponding mask bit for the trap in the trap mask register must be set to one, and (2) the traps of trap status register 700 must have been enabled by setting the enable bit in register 700. The exception to this is the instruction trap, which is not affected by the enable bit.

Trap handler invocation, excluding reset and the non-maskable interrupt, is performed as follows. When an unmasked trap is initiated, the enable bit of trap status register 700 is set to zero to disable further traps during servicing. The program counter address at which the program will later be restarted is stored. The initial trap handler instruction is fetched and multiplier registers of execution units 360a-n are copied into a shadow register within each respective execution unit 360a-n.

The reset and non-maskable interrupt trap handler invocation is distinguished from the other traps in several respects. Reset is not stored in trap status register 700. All of the tags in an instruction cache are invalidated during these two invocations and various internal control states are initialized throughout image processor 300.

As in the case of the other traps, when the trap handler is invoked, the enable bit of trap mask register is set to zero to disable traps. The program counter address at which the program will later be restarted is stored. The initial trap handler instruction is fetched. The multiplier register is copied into a shadow register in each execution unit 360a-n.

Prior to exiting the trap handler, a user of interface system 10 must ensure that the state of image processor 300 is restored. The extent of the restoration required varies depending on the application. Once the state of image processor 300 is restored, the return from trap instruction is executed. The return from trap instruction returns program execution of execution units 360a-n of image processor 300 to the address stored as the address of the instruction which was next in line for execution when the trap handler was entered. It also sets the enable bit of trap status register 700 to one in order to reenable the traps, and restores the state of the multiplier.

The following steps illustrate the sequence of events in the handling of a trap.

    Enter Trap
    Save [illegible] state to external memory
    Check for what kind of interrupt
    Do interrupt specific service routine
    User restores the state of the TMR
    User restores the rest of the [illegible] state
    Execute tret /* TRET now enables TMR EN register and does seq return */
    Exit Trap

Referring now to FIGS. 8A, B, there are shown data format 800 and data format 850 which are permitted within two-bus interface system 10. In view of the direction in which they are accessed, these two data formats may be described as: (1) most significant bit to least significant bit

it will be understood that data within two-bus interface system 10 must be arranged in different ways depending on whether it is in data format 800 or data format 850.

Two-bus interface system 10 always operates internally on data which is in data format 850. Furthermore, interface system 10 [illegible] memory [illegible] is stored [illegible]. In some [illegible] configurations, single-instruction multiple-data image processor 300 [illegible] in system address space 130, for example processor 40 and [illegible] memory [illegible] (see FIG. 1), which may use data which is in data format 800. The method of the present invention for implementing such accesses within two-bus interface system 10 assumes: (1) that incompatibility between a device using data format 800 and a device using data format 850 arises when data in data format 800 is interpreted by a device using data format 850, (2) that incompatibility between such devices arises when data in data format 850 is interpreted by a device using data format 800, and (3) that incompatibility does not arise when data in either data format 800 or in data format 850 are merely moved from one location to another within two-bus interface system 10.

For example, there are mechanisms in the translation from system virtual address space 130 to processor memory address space 150 to control the format of the data stored in system memory 36. When data is stored by image processor 300 by way of programmable map windows 122a, b into system virtual address space 130, system 10 provides, in datatype bits 416, the datatype information of the transaction coincident with the data. These datatype bits 416 are stored in comparison registers 434a, b within page table 120 and are provided during execution of page translation algorithm 400. These bits of comparison registers 434a, b may indicate whether data in the referenced page is in data format 800 or data format 850. Datatype information bits 416 are concatenated with replacement value 418 and offset 420 to form address 422.

The datatype information represented by datatype bits 416 is obtained from two sources in addresses generated by system processor 12. For microcode generated accesses, such as operations performed by block transfer controller 368, the datatype information is obtained from the block template or the scalar type, as programmed by the user. For direct memory access, the datatype information is provided from the direct memory access template. In all configurations of two-bus interface system 10 which share data between image processor 300 and devices using data format 800, datatype information is included along with all data transfers. The datatype information may be used by external logic (not shown) to perform conversions between data formats 800, 850. This logic may be included in any bus translation logic which may be provided to couple devices using data format 800 to peripheral component interface bus 26 which uses data format 850.

Referring now to FIG. 9, there is shown memory diagram 900. Memory diagram 900 illustrates the possible relative alignments of thirty-two bit data words in system virtual address space 130 and processor memory address space 150. Thirty-two bit word 902 in processor memory address space 150, for example, may be aligned four different ways with respect to thirty-two bit words in system virtual address space 130. For example, it may be transferred without any offset of the eight bit bytes A, B, C, D as shown in word 904.

data format 800 and (2) least significant bit to most signifi- However, there may be a one byte offset, a two byte offset, 

cant byte data format 850. In data format 800, decoding or a three byte offset as shown in words 906, 908, 910, 

starts at the most significant bit of the least significant byte 65 respectively, within system virtual address space 130. 

of a thirty-two bit word. In data format 850. decoding starts Two-bus interface system 10 automatically aligns data 

at the least significant hit of the least significant byte. Thus, transferred from a system resource, such as local memory 



42, to a local resource such as local memory 24, as well as 
data transferred from a local resource to a system resource 
aligned to the data type. This ability to write unaligned data 
and have it automatically aligned, for example by external 
hardware as previously described, makes more efficient use 
of interfaces 16, 28. The bits indicating the relative
alignment may be stored and determined within registers of
two-bus interface system 10 in a manner similar to that
previously described for indicating datatypes and other
information. For example, they may be stored within processor
300 or interfaces 16, 28. However, it will be understood that
this information may also be stored and determined by other
methods known to those skilled in the art.

For example, in an alternate embodiment, information 
with respect to automatic alignment may be transmitted, for 
example, within data type field 416 within algorithm 400. 
The datatype information stored in tables 434a, b may thus 
be selected by the user according to the source and the 
destination, thereby providing the information required for 
automatic alignment. Thus, using the bits of field 416, it is 
possible to determine the alignment and to transfer data with 
the proper offset according to the destination indicated by, 
for example, page address 412. In the preferred embodiment 
of the invention, the bits of data type field 416 are
transmitted by way of sideband or reserved signal lines of the bus
protocol.

It will be understood that this information is available 
within, for example, processor-to-system page translation 
table 120 during the execution of algorithm 400. Note that
determining these bits requires comparison of both the
source and the destination addresses to determine the
amount of offset.
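The comparison of source and destination addresses noted above can be illustrated with a minimal sketch. It assumes byte-addressed thirty-two bit words; the function names `alignment_offset` and `place_word` are hypothetical and do not appear in the specification, and the lane layout shown is only one plausible reading of words 904 through 910 of memory diagram 900.

```python
from typing import Tuple

WORD_BYTES = 4  # thirty-two bit words, as in memory diagram 900


def alignment_offset(source_addr: int, dest_addr: int) -> int:
    """Return the byte offset (0 to 3) of the destination relative to the source.

    Assumption: both addresses are byte addresses, so only their
    positions within a four-byte word matter.
    """
    return (dest_addr - source_addr) % WORD_BYTES


def place_word(word: bytes, offset: int) -> Tuple[bytes, bytes]:
    """Spread a four-byte word over two destination words at a byte offset.

    Byte lanes the word does not occupy are shown as b"-" for display only.
    """
    if len(word) != WORD_BYTES or not 0 <= offset < WORD_BYTES:
        raise ValueError("expected a four-byte word and an offset of 0 to 3")
    lanes = b"-" * offset + word + b"-" * (WORD_BYTES - offset)
    return lanes[:WORD_BYTES], lanes[WORD_BYTES:]


if __name__ == "__main__":
    # Show the four possible relative alignments of bytes A, B, C, D.
    for off in range(WORD_BYTES):
        print(off, place_word(b"ABCD", off))
```

With an offset of zero the word lands in a single destination word; with an offset of one, two, or three bytes it straddles two destination words, which is why both addresses are needed before the transfer can be aligned.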

It will be understood that various changes in the details, 
materials and arrangements of the parts which have been 
described and illustrated in order to explain the nature of this 
invention, may be made by those skilled in the art without
departing from the principle and scope of the invention as 
expressed in the following claims. 

We claim: 

1. An apparatus for translating a first address in a first 
address space to a second address in a second address space, 
comprising: 

(a) a processor; 

(b) a page table having a translation mask register, a 
comparison value register, and a replacement value 
register; and 

(c) a comparator coupled to the comparison value register 
and to the replacement value register; wherein: 
the first address comprises a first subaddress compris- 
ing a subset of the bits of the first address and a 
second subaddress comprising remaining bits of the 
first address; 

the first subaddress masked with a mask value stored in 
the mask register is compared by the comparator 
with successive comparison values stored in the 
comparison value register until a match comparison 
value is found; and 

a replacement value in the replacement value register 
corresponding to the match comparison value is
concatenated with the second subaddress to provide 
the second address. 

2. The apparatus of claim 1, further comprising a processor
address space comparator for comparing the first address
to a processor address space threshold, wherein the second 
address is equal to the first address if the first address is less 
than the threshold. 



3. The apparatus of claim 1, wherein, if the first subad- 
dress masked with the mask value in the mask register 
matches more than one comparison value in the comparison 
value register, then the first of the comparison values that 
match is used to determine the replacement value that is 
concatenated with the second subaddress to provide the 
second address. 

4. The apparatus of claim 1, wherein the first address 
space is a processor address space of the processor and the 
second address space is a system address space of a com- 
puter system with which the processor is associated. 

5. The apparatus of claim 1, wherein: 
the first address and second address are 32-bit addresses;
and

the mask register stores 20-bit masks which partition the 
first address into the first subaddress and the second 
subaddress. 

6. The apparatus of claim 5, wherein the first subaddress 
comprises a variable page address of 0 to 20 bits and the
second subaddress comprises a variable offset of 12 to 32
bits.

7. The apparatus of claim 1, wherein: 
data having a data format is associated with the first
address space;

the comparison value register stores, for each of said 
successive comparison values, at least one data format 
type bit that indicates the data format of data associated 
with an address of the first address space that matches 
each successive comparison value; and 
the at least one data format type bit is concatenated with 
the replacement value in the replacement value register 
corresponding to the match comparison value and the 
second subaddress to provide the second address. 

8. The apparatus of claim 7, wherein the data format is one 
of: a least significant bit to most significant bit format, and 
a most significant bit to least significant bit format. 

9. A method for translating a first address in a first address 
space to a second address in a second address space, 
comprising the steps of: 

(a) providing a processor, a page table having a translation 
mask register, a comparison value register, and a 
replacement value register, and a comparator coupled 
to the comparison value register and to the replacement 
value register;

(b) partitioning the first address into a first subaddress 
comprising a subset of the bits of the first address and 
a second subaddress comprising remaining bits of the 
first address; 

(c) comparing with the comparator the first subaddress 
masked with a mask value stored in the mask register 
with successive comparison values stored in the com- 
parison value register until a match comparison value is 
found; and 

(d) concatenating a replacement value in the replacement 
value register corresponding to the match comparison 
value with the second subaddress to provide the second 
address. 

10. The method of claim 9, further comprising the steps 
of: 

(e) comparing the first address to a processor address 
space threshold; and 

(f) if the first address is less than the threshold, then 
setting the second address equal to the first address. 

11. The method of claim 9, wherein, if the first subaddress 
masked with the mask value in the mask register matches 
more than one comparison value in the comparison value
register, then the first of the comparison values that match is 
used to determine the replacement value that is concatenated 
with the second subaddress to provide the second address. 

12. The method of claim 9, wherein the first address space
is a processor address space of the processor and the second 
address space is a system address space of a computer 
system with which the processor is associated. 

13. The method of claim 9, wherein:

the first address and second address are 32-bit addresses;
and

the mask register stores 20-bit masks which partition the
first address into the first subaddress and the second 
subaddress. 

14. The method of claim 13, wherein the first subaddress
comprises a variable page address of 0 to 20 bits and the 
second subaddress comprises a variable offset of 12 to 32 
bits. 

15. The method of claim 9, wherein:

data having a data format is associated with the first 
address space; 

the comparison value register stores, for each of said
successive comparison values, at least one data format 
type bit that indicates the data format of data associated 
with an address of the first address space that matches
each successive comparison value; and 

the at least one data format type bit is concatenated with 
the replacement value in the replacement value register 
corresponding to the match comparison value and the 
second subaddress to provide the second address. 

16. The method of claim 15, wherein the data format is
one of: a least significant bit to most significant bit format,
and a most significant bit to least significant bit format.
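
The translation recited in the claims above may be easier to follow with a short illustrative sketch. It is not the claimed apparatus and does not limit the claims. It assumes that each page table entry holds its own 20-bit mask, comparison value, replacement value and data format type bit, that the mask applies to the upper 20 bits of a 32-bit address, and that the mask bits are contiguous from the top so that merging by mask is equivalent to concatenation. The names `Entry` and `translate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ADDR_MASK = 0xFFFFFFFF  # 32-bit addresses
MIN_OFFSET_BITS = 12    # a 20-bit mask leaves at least a 12-bit offset


@dataclass
class Entry:
    mask: int         # 20-bit translation mask (assumed contiguous from the top)
    comparison: int   # 20-bit comparison value
    replacement: int  # 20-bit replacement value
    datatype: int     # data format type bit stored with the comparison value


def translate(
    addr: int, table: List[Entry], threshold: int
) -> Optional[Tuple[int, Optional[int]]]:
    """Translate a first address into (second address, data format type bit).

    Returns None when no comparison value matches.
    """
    # Addresses below the threshold pass through unchanged.
    if addr < threshold:
        return addr, None

    upper = (addr & ADDR_MASK) >> MIN_OFFSET_BITS
    for entry in table:  # successive comparison; the first match is used
        if (upper & entry.mask) == (entry.comparison & entry.mask):
            full_mask = entry.mask << MIN_OFFSET_BITS
            # Masked bits come from the replacement value; the remaining
            # bits are the second subaddress (the offset).
            page = (entry.replacement << MIN_OFFSET_BITS) & full_mask
            offset = addr & ~full_mask & ADDR_MASK
            return page | offset, entry.datatype
    return None


if __name__ == "__main__":
    table = [
        Entry(mask=0xFFF00, comparison=0x80000, replacement=0x12300, datatype=1),
        Entry(mask=0xF0000, comparison=0x80000, replacement=0xA0000, datatype=0),
    ]
    result = translate(0x800ABCDE, table, threshold=0x80000000)
    print(hex(result[0]), result[1])
```

In this sketch the first entry uses a 12-bit page address with a 20-bit offset, and the second uses a 4-bit page address with a 28-bit offset, both within the 0 to 20 bit and 12 to 32 bit ranges recited. An address that matches both entries is translated by the first one.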

***** 